
3 research outputs found

Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey

Authors: Biswal Biswajit; Muallem Asmah; Pan Jan W.; Shetty Sachin; Zhao Juan
Publication venue: ODU Digital Commons
Publication date: 01/01/2017

This survey aims to deliver an extensive, well-structured overview of machine learning for detecting anomalies in streaming datasets. The objective is to demonstrate the effectiveness of Hoeffding Trees as a machine learning solution for detecting anomalies in streaming cyber datasets. We categorize the existing Hoeffding Tree research that is feasible for this type of study into three groups: distributed Hoeffding Trees, ensembles of Hoeffding Trees, and existing techniques that use Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper and were selected based on their relation to streaming data and the flexibility of their techniques for use within different domains of streaming data. We discuss how combining techniques from the research works within these compositions can address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem, anomaly detection.

Old Dominion University

Visualizing geolocation of spam email

Author: Muallem Asmah
Publication venue: Digital Scholarship @ Tennessee State University
Publication date: 01/01/2012

Viruses and phishing scams resulting from spam are becoming increasingly numerous. The spontaneous methods used by spammers present a threat to spam prevention. Tools for spam identification and prevention are growing in number but lack presentation fundamentals. A primary concern is the lack of tools to effectively analyze spammer location information from online databases. This thesis develops a security visualization framework based on the integration of the MaxMind and WHOIS databases and the Google Maps API. The framework provides a central, one-stop location for visualizing spam email origination and activity, and it is extensible, with the capability to add resources for further analysis. The focus of this work was reducing the time network analysts spend on spam analysis. Requirements for the system and each subsystem were constructed, along with consideration of alternatives for each subsystem, and were validated through testing of the system. Overall, the requirements focused on ease of use and time reduction in the spam analysis process. Development and implementation integrated MaxMind, WHOIS, and raw real-time spam emails to provide a visualization of spam origination and spam activity using a Google Map, Google Map markers, info windows, and polygons. Three major subsystems were used for the implementation: 1) a Data Acquisition Subsystem (collects spam emails for a period of time); 2) a Database Design Subsystem (processes spam email and retrieves geographical and WHOIS information while analyzing and storing results); and 3) a Visualization Subsystem (retrieves processed data from the database and manipulates Google Map controls to display the visualization). Testing of the system identified spammer spatial patterns, such as spammers distributing spam from one location and one registered ISP or host using different email addresses. Testing also identified locations where there was a variance in spam, such as regions from which multiple spam emails were distributed. With more time and more data, temporal patterns of spammers could also be identified. A security visualization tool with a user-friendly interactive interface integrating common databases proved effective in determining patterns of spam activity. Providing one tool with multiple functionalities also reduces the time spent by network analysts, because the displayed spam activity data is simple to comprehend. ISPs can benefit from this tool by identifying spam activity initiated through their network and taking the appropriate legal measures. Users can also benefit by visualizing spam activity from their email and taking appropriate measures to prevent fraud, phishing scams, or viruses that can harm them.

Digital Scholarship @ Tennessee State University

TDDEHT: Threat Detection using Distributed Ensembles of Hoeffding Trees on Streaming Cyber Datasets

Author: Muallem Asmah
Publication venue: Digital Scholarship @ Tennessee State University
Publication date: 01/01/2018

In the evolving world of technology, massive streams of diverse data from disparate sources are generated at an unparalleled rate. Recently, more advanced data stream mining (DSM) machine learning approaches have been proposed to process this growing flow of data efficiently. Most of these studies propose the use of a well-known state-of-the-art classifier, the Hoeffding Tree, and generally focus on achieving improved accuracy when exceedingly complex drifts are present. However, only a few have explored the challenges faced in advanced DSM for anomaly-based network Intrusion Detection Systems (IDS), and those frequently validate with outdated cyber datasets, despite the common relation between anomalies and concept drift. In this paper, we propose an enhanced distributed Hoeffding Tree ensemble framework for IDS, built on Spark Streaming. Our approach extends an existing ensemble-based machine learning approach by combining diverse Hoeffding Trees and producing evaluation metrics to identify the most efficient type of Hoeffding Tree for detecting cyber-attacks, while providing a framework that is extensible to additional linear classifiers. To demonstrate the accuracy of our approach, we evaluate it on various up-to-date real-world and synthetic cyber-attack and concept-drift datasets from reputable sources. Our experimental results demonstrate that our approach properly identifies classifiers while improving accuracy and supplemental evaluation metrics, using fewer resources and less processing time.

Digital Scholarship @ Tennessee State University